External Sorting on Flash Memory Via Natural Page Run Generation

Authors

  • Yang Liu
  • Zhen He
  • Yi-Ping Phoebe Chen
  • Thi Nguyen
Abstract

The increasing popularity of flash memory means more database systems will run on flash memory in the future. One of the most important database operations is the external sort. Hence, this paper studies the problem of efficient external sorting on flash memory. In contrast to most previous work, we target the situation where previously sorted data has become progressively unsorted due to data updates. Accordingly, we call this "partially" sorted data. We focus on re-sorting partially sorted data by taking advantage of its partially sorted nature to speed up the run generation phase of the traditional external merge sort. We do this by finding "naturally occurring" page runs in the partially sorted data. Our algorithm can perform up to 1024 times less write I/O than a traditional external merge sort during the run generation phase. We map the problem of finding naturally occurring runs into the shortest-distance problem in a directed acyclic graph (DAG). Accordingly, we propose an optimal solution to the problem using the well-known DAG-Shortest-Paths algorithm. However, we found the optimal solution was too slow for even moderately sized data sets, and accordingly propose a fast heuristic solution, which we experimentally show finds a high percentage of page runs with minimal computational overhead. Experiments using both real and synthetic data sets show our heuristic algorithm can halve the external sorting time when compared to three likely competing external sorting algorithms.
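To make the idea of "naturally occurring" page runs concrete, here is a minimal Python sketch. It is not the paper's optimal or heuristic algorithm, which the abstract does not spell out. It assumes each page is summarized by a hypothetical (min_key, max_key) pair with its records already sorted inside the page, and it greedily chains pages whose key ranges do not overlap. The function name `greedy_page_runs` and the best-fit rule are illustrative assumptions.

```python
from typing import List, Tuple

# Assumption: a page is summarized by (min_key, max_key) and is sorted internally.
Page = Tuple[int, int]


def greedy_page_runs(pages: List[Page]) -> List[List[int]]:
    """Group page indices into runs whose key ranges ascend without overlap.

    Pages in one run can be read back in order as an already sorted run,
    so they would not need to be rewritten during run generation.
    """
    runs: List[List[int]] = []   # each run is a list of page indices
    run_max: List[int] = []      # max key of the last page in each run

    for idx, (lo, hi) in enumerate(pages):
        # Best fit: pick the run whose last key is the largest one still <= lo.
        best = -1
        for r, last_key in enumerate(run_max):
            if last_key <= lo and (best == -1 or last_key > run_max[best]):
                best = r
        if best == -1:
            # No existing run can be extended, so this page starts a new run.
            runs.append([idx])
            run_max.append(hi)
        else:
            runs[best].append(idx)
            run_max[best] = hi
    return runs


if __name__ == "__main__":
    example = [(1, 5), (6, 9), (3, 4), (10, 12), (5, 8)]
    print(greedy_page_runs(example))
```

On the example input, pages 0, 1 and 3 chain into one run and pages 2 and 4 into another, so only two runs enter the merge phase. A greedy rule like this gives no optimality guarantee, which is the kind of gap the DAG shortest-path formulation in the abstract is meant to close.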


Similar resources

Efficient External Sorting on Flash Memory Embedded Devices

Many embedded system applications involve storing and querying large datasets. Existing research in this area has focused on adapting and applying conventional database algorithms to embedded devices. Algorithms designed for processing queries on embedded devices must be able to execute given the small amount of available memory and energy constraints. Most embedded devices use flash memory to ...


PatTrieSort - External String Sorting based on Patricia Tries

External merge sort is among the most efficient and widely used algorithms for sorting big data: as much data as fits is sorted in main memory and then written to external storage as a so-called initial run. After all the data has been sorted block-wise in this way, the initial runs are merged in a merging phase in order to retrieve the final sorted run containing the completely sorted origina...


Designing Database Operators for Flash-enabled Memory Hierarchies

Flash memory affects not only storage options but also query processing. In this paper, we analyze the use of flash memory for database query processing, including algorithms that combine flash memory and traditional disk drives. We first focus on flash-resident databases and present data structures and algorithms that leverage the fast random reads of flash to speed up selection, projection, a...


FlashVM: Virtual Memory Management on Flash

With the decreasing price of flash memory, systems will increasingly use solid-state storage for virtual-memory paging rather than disks. FlashVM is a system architecture and a core virtual memory subsystem built in the Linux kernel that uses dedicated flash for paging. FlashVM focuses on three major design goals for memory management on flash: high performance, reduced flash wear out for impro...


The Devil Is in the Details: Implementing Flash Page Reuse with WOM Codes

Flash memory is prevalent in modern servers and devices. Coupled with the scaling down of flash technology, the popularity of flash memory motivates the search for methods to increase flash reliability and lifetime. Erasures are the dominant cause of flash cell wear, but reducing them is challenging because flash is a write-once medium: memory cells must be erased prior to writing. An approach ...



Journal title:
  • Comput. J.

Volume 54  Issue 

Pages  -

Publication date 2011